 Tunisia


Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions

Neural Information Processing Systems

The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions. Despite its simplicity, the analysis of its performance in learning from the training data is only well understood in cases that essentially reduce to singular value decomposition of the data. Here, we consider the limit of a large dimension of the input space and a constant number of hidden units. In this limit, we simplify the standard RBM training objective into a form that is equivalent to the multi-index model with non-separable regularization. This opens a path to analyze training of the RBM using methods that are established for multi-index models, such as Approximate Message Passing (AMP) and its state evolution, and the analysis of Gradient Descent (GD) via the dynamical mean-field theory. We then give rigorous asymptotics of the training dynamics of RBMs on data generated by the spiked covariance model as a prototype of a structure suitable for unsupervised learning. We show in particular that RBMs reach the optimal computational weak recovery threshold, aligning with the Baik-Ben Arous-Péché (BBP) transition, in the spiked covariance model.
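
As a rough numerical illustration of the BBP transition mentioned in the abstract above, the sketch below samples from a rank-one spiked covariance model and measures how well the top eigenvector of the sample covariance aligns with the planted direction. It is not taken from the paper and does not involve an RBM: the function names, the parametrization of the population covariance as `I + beta * u u^T`, and the problem sizes are all assumptions made for illustration. The closed-form overlap used for comparison is the classical random-matrix result for spiked sample covariance matrices, with weak recovery possible only when `beta > sqrt(d / n)`.

```python
import numpy as np


def top_eigvec_overlap(n, d, beta, seed=0):
    """Squared overlap between the planted spike and the top sample eigenvector.

    Assumed model (illustrative): rows x_i = sqrt(beta) * g_i * u + z_i with
    g_i ~ N(0, 1) and z_i ~ N(0, I_d), so the population covariance is
    I + beta * u u^T.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # planted unit direction
    g = rng.standard_normal((n, 1))        # latent scalar per sample
    z = rng.standard_normal((n, d))        # isotropic noise
    x = np.sqrt(beta) * g * u + z          # broadcasts to shape (n, d)
    cov = x.T @ x / n                      # sample covariance (zero-mean data)
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    v = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    return float((v @ u) ** 2)


def predicted_overlap(gamma, beta):
    """Asymptotic squared overlap for aspect ratio gamma = d / n."""
    if beta <= np.sqrt(gamma):
        return 0.0                         # below the BBP threshold
    return (1.0 - gamma / beta**2) / (1.0 + gamma / beta)


if __name__ == "__main__":
    n, d = 1600, 400                       # gamma = 0.25, threshold beta = 0.5
    for beta in (0.1, 2.0):
        print(beta, top_eigvec_overlap(n, d, beta), predicted_overlap(d / n, beta))
```

Below the threshold the empirical overlap should stay close to zero up to finite-size effects, while above it the overlap should approach the predicted value as `n` and `d` grow at a fixed ratio.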


Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks

Neural Information Processing Systems

We study the dynamics of stochastic gradient descent (SGD) for a class of sequence models termed Sequence Single-Index (SSI) models, where the target depends on a single direction in input space applied to a sequence of tokens. This setting generalizes classical single-index models to the sequential domain, encompassing simplified one-layer attention architectures. We derive a closed-form expression for the population loss in terms of a pair of sufficient statistics capturing semantic and positional alignment, and characterize the induced high-dimensional SGD dynamics for these coordinates. Our analysis reveals two distinct training phases: escape from uninformative initialization and alignment with the target subspace, and demonstrates how the sequence length and positional encoding influence convergence speed and learning trajectories. These results provide a rigorous and interpretable foundation for understanding how sequential structure in data can be beneficial for learning with attention-based models. Stochastic Gradient Descent (SGD) is the core optimization tool driving modern machine learning. Recent years have seen substantial progress in understanding its dynamics, particularly in two-layer networks [Saad and Solla, 1995, Mei et al., 2018, Chizat and Bach, 2018, Rotskoff and Vanden-Eijnden, 2022, Sirignano and Spiliopoulos, 2020, Arnaboldi et al., 2023a]. While global convergence is qualitatively well understood when the network is wide enough, quantitative results are scarcer. A particularly fruitful body of recent theoretical work addressing this gap has focused on deriving precise convergence rates for particular model classes on synthetic data, such as high-dimensional Gaussian single and multi-index models [Ben Arous et al., 2021, Abbe et al., 2022, 2023].


Optimal Spectral Transitions in High-Dimensional Multi-Index Models

Neural Information Processing Systems

We consider the problem of how many samples from a Gaussian multi-index model are required to weakly reconstruct the relevant index subspace. Despite the model's increasing popularity as a testbed for investigating the computational complexity of neural networks, results beyond the single-index setting remain elusive. In this work, we introduce spectral algorithms based on the linearization of a message passing scheme tailored to this problem. Our main contribution is to show that the proposed methods achieve the optimal reconstruction threshold. Leveraging a high-dimensional characterization of the algorithms, we show that above the critical threshold the leading eigenvector correlates with the relevant index subspace, a phenomenon reminiscent of the Baik-Ben Arous-Péché (BBP) transition in spiked models arising in random matrix theory.


World's shark attack hotspots revealed: As a great white is spotted in the Mediterranean, experts reveal the areas where you're most likely to be bitten

Daily Mail - Science & tech

The world's shark attack hotspots have been revealed, after a great white shark was spotted in the Mediterranean Sea. The enormous predator was recorded between Sicily and Tunisia, in what is believed to be the first ever footage captured of an adult great white in the area.


Great white shark is recorded underwater in the Mediterranean for the first time ever

Daily Mail - Science & tech

A great white shark has been spotted underwater in the Mediterranean for the first time ever. Divers from Healthy Seas were removing ghost nets on an offshore shipwreck between Sicily and Tunisia when they spotted the predator.


There Will Be a Scientific Theory of Deep Learning

arXiv.org Machine Learning

In this paper, we make the case that a scientific theory of deep learning is emerging. By this we mean a theory which characterizes important properties and statistics of the training process, hidden representations, final weights, and performance of neural networks. We pull together major strands of ongoing research in deep learning theory and identify five growing bodies of work that point toward such a theory: (a) solvable idealized settings that provide intuition for learning dynamics in realistic systems; (b) tractable limits that reveal insights into fundamental learning phenomena; (c) simple mathematical laws that capture important macroscopic observables; (d) theories of hyperparameters that disentangle them from the rest of the training process, leaving simpler systems behind; and (e) universal behaviors shared across systems and settings which clarify which phenomena call for explanation. Taken together, these bodies of work share certain broad traits: they are concerned with the dynamics of the training process; they primarily seek to describe coarse aggregate statistics; and they emphasize falsifiable quantitative predictions. We argue that the emerging theory is best thought of as a mechanics of the learning process, and suggest the name learning mechanics. We discuss the relationship between this mechanics perspective and other approaches for building a theory of deep learning, including the statistical and information-theoretic perspectives. In particular, we anticipate a symbiotic relationship between learning mechanics and mechanistic interpretability. We also review and address common arguments that fundamental theory will not be possible or is not important. We conclude with a portrait of important open directions in learning mechanics and advice for beginners. We host further introductory materials, perspectives, and open questions at learningmechanics.pub.


Algorithmic Contiguity from Low-Degree Heuristic II: Predicting Detection-Recovery Gaps

arXiv.org Machine Learning

The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the detection-recovery gap phenomenon that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation [SW22, SW25, CGGV25+], the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining conditional computational lower bounds for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in [Li25] with a cross-validation reduction in [DHSS25] that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in [SW22, SW25] via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.


Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

arXiv.org Machine Learning

Empirical studies of trained models often report a transient regime in which signal is detectable in a finite gradient descent time window before overfitting dominates. We provide an analytically tractable random-matrix model that reproduces this phenomenon for gradient flow in a linear teacher-student setting. In this framework, learning occurs when an isolated eigenvalue separates from a noisy bulk, before eventually disappearing in the overfitting regime. The key ingredient is anisotropy in the input covariance, which induces fast and slow directions in the learning dynamics. In a two-block covariance model, we derive the full time-dependent bulk spectrum of the symmetrized weight matrix through a $2\times 2$ Dyson equation, and we obtain an explicit outlier condition for a rank-one teacher via a rank-two determinant formula. This yields a transient Baik-Ben Arous-Péché (BBP) transition: depending on signal strength and covariance anisotropy, the teacher spike may never emerge, emerge and persist, or emerge only during an intermediate time interval before being reabsorbed into the bulk. We map the corresponding phase diagrams and validate the theory against finite-size simulations. Our results provide a minimal solvable mechanism for early stopping as a transient spectral effect driven by anisotropy and noise.


Information-Geometric Decomposition of Generalization Error in Unsupervised Learning

arXiv.org Machine Learning

We decompose the Kullback-Leibler generalization error (GE), the expected KL divergence from the data distribution to the trained model, of unsupervised learning into three non-negative components: model error, data bias, and variance. The decomposition is exact for any e-flat model class and follows from two identities of information geometry: the generalized Pythagorean theorem and a dual e-mixture variance identity. As an analytically tractable demonstration, we apply the framework to $ε$-PCA, a regularized principal component analysis in which the empirical covariance is truncated at rank $N_K$ and discarded directions are pinned at a fixed noise floor $ε$. Although rank-constrained $ε$-PCA is not itself e-flat, it admits a technical reformulation with the same total GE on isotropic Gaussian data, under which each component of the decomposition takes closed form. The optimal rank emerges as the cutoff $λ_{\mathrm{cut}}^{*} = ε$: the model retains exactly those empirical eigenvalues exceeding the noise floor, with the cutoff reflecting a marginal-rate balance between model-error gain and data-bias cost. A boundary comparison further yields a three-regime phase diagram (retain-all, interior, and collapse) separated by the lower Marchenko-Pastur edge and an analytically computable collapse threshold $ε_{*}(α)$, where $α$ is the dimension-to-sample-size ratio. All claims are verified numerically.
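
The truncation described in the abstract above can be sketched in a few lines. This is only one reading of the abstract's wording, not the paper's definition or code: the function name `epsilon_pca_covariance`, the assumption of zero-mean data, and the default of choosing the rank by the cutoff rule (keep eigenvalues above `eps`) are all illustrative assumptions.

```python
import numpy as np


def epsilon_pca_covariance(x, eps, rank=None):
    """Rank-truncated covariance with discarded directions pinned at eps.

    x    : (n, d) array of samples, assumed zero-mean (illustrative assumption)
    eps  : noise floor assigned to every discarded eigen-direction
    rank : number of leading empirical eigenvalues to keep; if None, keep
           exactly those exceeding eps (the cutoff rule quoted in the abstract)
    """
    n, _ = x.shape
    cov = x.T @ x / n                          # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending order
    eigvals = eigvals[::-1]                    # reorder to descending
    eigvecs = eigvecs[:, ::-1]
    if rank is None:
        rank = int(np.sum(eigvals > eps))
    kept = eigvals.copy()
    kept[rank:] = eps                          # pin discarded directions at eps
    model_cov = (eigvecs * kept) @ eigvecs.T   # V diag(kept) V^T
    return model_cov, rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((100, 20))      # isotropic Gaussian toy data
    model_cov, rank = epsilon_pca_covariance(data, eps=1.0)
    print("retained rank:", rank)
```

Passing an explicit `rank` gives the rank-constrained variant, while leaving it as `None` applies the cutoff at the noise floor.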